Skip to content

Conversation

@marek-szews
Copy link

@marek-szews marek-szews commented Oct 14, 2025

Implements [gRFC A68]https://github.com/grpc/proposal/blob/master/A68-random-subsetting.md.

Note that this PR only implements the LB policy and does not implement the xDS integration specified here: https://github.com/grpc/proposal/blob/master/A68-random-subsetting.md#xds-integration

RELEASE NOTES:

  • balancer/randomsubsetting: Implementation of the random_subsetting LB policy

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Oct 14, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@easwars easwars self-assigned this Oct 15, 2025
@easwars easwars self-requested a review October 15, 2025 19:17
@easwars easwars added Type: Feature New features or improvements in behavior Area: xDS Includes everything xDS related, including LB policies used with xDS. Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. and removed Area: xDS Includes everything xDS related, including LB policies used with xDS. labels Oct 15, 2025
@easwars
Copy link
Contributor

easwars commented Oct 15, 2025

@marek-szews : Could you please get the tests to pass. Thanks.

}

// if someonw needs subsetSize == 1, he should use pick_first instead
if lbCfg.SubsetSize < 2 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gRFC only says the following:

    // The value is required and must be greater than 0.

Is the condition that the size be at least 2 specified somewhere else?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the discussion at line 86

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 86 of what?

The only thing we should be checking is whether the subset size is greater than 0 and if not, returning an error from this method.

@github-actions
Copy link

This PR is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@github-actions github-actions bot added stale and removed stale labels Oct 23, 2025
@dfawley
Copy link
Member

dfawley commented Oct 29, 2025

Hi @marek-szews - I think we're waiting on you to make some updates so we can continue the review. We can discuss Friday but I wanted to ping this in part to prevent the stale bot from closing it.

@arjan-bal arjan-bal added this to the 1.78 Release milestone Oct 30, 2025
Copy link
Author

@marek-szews marek-szews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rework done, tests passed

@dfawley dfawley assigned easwars and unassigned marek-szews Oct 31, 2025
@easwars easwars changed the title Implementation of A68 random_subsetting LB policy. balancer/randomsubsetting: Implementation of the LB policy Nov 13, 2025
@easwars easwars changed the title balancer/randomsubsetting: Implementation of the LB policy balancer/randomsubsetting: Implementation of the random_subsetting LB policy Nov 13, 2025
@easwars
Copy link
Contributor

easwars commented Nov 13, 2025

Rework done, tests passed

Tests are still failing. Please read this document for guidelines: https://github.com/grpc/grpc-go/blob/master/CONTRIBUTING.md

TL;dr

All tests need to be passing before your change can be merged. We recommend you run tests locally before creating your PR to catch breakages early on:

./scripts/vet.sh to catch vet errors.
go test -cpu 1,4 -timeout 7m ./... to run the tests.
go test -race -cpu 1,4 -timeout 7m ./... to run tests in race mode

@easwars easwars assigned marek-szews and unassigned easwars Nov 13, 2025
}

if b.cfg == nil || b.cfg.ChildPolicy.Name != lbCfg.ChildPolicy.Name {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this newline.

type lbConfig struct {
serviceconfig.LoadBalancingConfig `json:"-"`

SubsetSize uint64 `json:"subsetSize,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be uint32 as per the gRFC.

func (b *subsettingBalancer) prepareChildResolverState(s balancer.ClientConnState) resolver.State {
subsetSize := b.cfg.SubsetSize
endPoints := s.ResolverState.Endpoints
backendCount := len(endPoints)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Please get rid of the backendCount local variable. The cide is more readable when used as len(endpoints).

// as described in A68: https://github.com/grpc/proposal/blob/master/A68-random-subsetting.md
func (b *subsettingBalancer) prepareChildResolverState(s balancer.ClientConnState) resolver.State {
subsetSize := b.cfg.SubsetSize
endPoints := s.ResolverState.Endpoints
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/endPoints/endpoints/

endpoint is a single word.


// sort endpoint by hash
slices.SortFunc(endpointSet, func(a, b endpointWithHash) int {
return cmp.Compare(a.hash, b.hash)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not use the cmp package in non-test code. The cmp package is only meant for test code. Please use the < operator here as the hash field is a uint64

// The backend receives a second request from the same client. This could happen if the client have 1 backend in READY
// state while the other are CONNECTING. In this case round_robbin will pick the same address twice.
// We are going to retry after short timeout.
time.Sleep(10 * time.Microsecond)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid sleeping in tests.

return clients, cancel
}

func checkRoundRobinRPCs(ctx context.Context, t *testing.T, clients []testgrpc.TestServiceClient, subsetSize int, maxDiff int) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have something like this already:

func CheckRoundRobinRPCs(ctx context.Context, client testgrpc.TestServiceClient, addrs []resolver.Address) error {

Would you be able to use that instead? Thanks

Comment on lines +186 to +190
cancel := func() {
for _, cc := range ccs {
cc.Close()
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use t.Cleanup here as well and avoid having to return a cancel function that the test needs to invoke.

maxDiff int
}{
{
name: "backends could be evenly distributed between small number of clients",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see here for guidelines on how best to name subtests: https://google.github.io/styleguide/go/decisions#subtest-names

TL;dr
Think of subtest names more like a function identifier than a prose description.

The test runner replaces spaces with underscores, and escapes non-printing characters. To ensure accurate correlation between test logs and source code, it is recommended to avoid using these characters in subtest names.

func (bb) Build(cc balancer.ClientConn, bOpts balancer.BuildOptions) balancer.Balancer {
b := &subsettingBalancer{
Balancer: gracefulswitch.NewBalancer(cc, bOpts),
hashf: xxhash.NewWithSeed(uint64(time.Now().UnixNano())),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to use a random number here as specified in the gRFC. Also, it would be useful to be able to override the random value in tests for predictability. Here is an example of how to do what I'm suggesting:

var randInt64n = rand.Int64N

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. Type: Feature New features or improvements in behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants